This study investigated the effect of language immersion in an
English-speaking environment on the production of intonational features in L2
English sentences. It was hypothesized that a Korean group who had
been immersed in the English language as children would have
intonation patterns more similar to those of native English speakers than a
non-immersed group of Korean speakers who otherwise shared similar
experience and proficiency with English. Sixty subjects in three groups -
20 Korean adults in the immersed group, 20 Korean adults in the
non-immersed group, and 20 native English speakers (as a control group) -
took part in the experiment. The immersed group was more native-like
in having a steeper F0 declination tilt (a wider F0 range and a faster
speech rate). However, the immersed group, like the non-immersed group,
exhibited difficulty in resetting F0 between adjacent phrases at boundaries.
These results suggest that the acquisition of second language (L2)
intonation is affected by early immersion in an L2 environment, but that the
degree of intonational acquisition varies by subarea, with
some F0-related cues proving hard to acquire.

speech rate, declination tilt, boundary cues


1 Introduction 

Intonation plays a crucial role in intelligibility. Intonation refers to the use of 
suprasegmental phonetic features to convey sentence-level pragmatic 
meanings in a linguistically structured way (Ladd, 1996). These structures are 
affected by characteristics of the speech as well as by subject variables. The
aim of the study is to investigate how early immersion influences the
acquisition of L2 intonation. For this investigation, intonational
phonetic features such as overall F0 declination tilt and extent of boundary 
resetting were analyzed. 

Intonation is considered an important aspect of prosody affecting 
intelligibility in L2 speech production (e.g., Laures & Weismer, 1999; 
Maassen & Povel, 1985; Mennen, 2006). Even though intelligibility is related 
to both segmental (Jenkins, 2000; Munro & Derwing, 2008) and prosodic
features (Anderson-Hsieh & Koehler, 1988; Tajima, Port, & Dalby, 1997),
some researchers have reported that prosodic factors are more important than
segmental factors for perceived foreign accent and intelligibility (e.g.,
Anderson-Hsieh & Johnson, 1992; Bradlow, Torretta, & Pisoni, 1996; James,
1976; Tajima, Port, & Dalby, 1994). Since intonational features provide
information for the interpretation of elements in a sentence as new versus old,
salient versus weak, or foreground versus background, their function is crucial
in acquiring the intelligible prosody of the second language (e.g., Morley,
1992; Wennerstrom, 1991, 1994). 

L2 intonational features are affected by various factors: the age of L2
acquisition (Guion et al., 2000; Tahta et al., 1981), language experience
(Flege & Liu, 2001; Mennen, 2004; Trofimovich & Baker, 2006), the
background of the native language (Archibald, 1995; Archibald, 1998; Davis
& Kelly, 1997; Guion et al., 2004), and motivation (Conrad, 1991; Moyer,
1999). Among these factors, transfer from the native language has been 
suggested to play a key role in forming L2 intonational production. Delattre 
(1963) claimed that speakers tended to impose their native intonation patterns 
on their second language. In his study, a French learner of English produced
different intonation from a native English speaker due to the use of French
intonation patterns in the English production. Wennerstrom (1994)
investigated the English prosody produced by native Spanish, Japanese, and 
Thai speakers and reported that while the native speakers made significant 
use of pitch contrasts to signal focus on the items measured, the non-native 
speakers did not consistently use pitch to signal meaning contrasts in many of 
the same environments. Aoyama and Guion (2007) reported that there were
considerable differences between English spoken by Japanese speakers of L2
English and by native English speakers, both in duration of linguistic units
and overall F0 range, and suggested that the prosodic differences stemmed
from L1 background. These studies suggest that immersion experience
reduces the degree of L1 transfer for young children as well
as adult learners.

To date, how early immersion influences L2 intonational acquisition has
rarely been investigated by comparison with non-immersed groups of nearly
equal L2 proficiency. Some longitudinal studies, however,
report contradictory results (e.g., Trofimovich and Baker, 2006). Grover et al.
(1987) reported that 10-year-old French immersion subjects produced
French intonation in a more native-like way than 16-year-old students. Shen
(1990) also reported that some Chinese learners of French with 7 to 14 years
were judged to produce French interrogative intonation with significant
accuracy. In contrast, other studies suggest the opposite for the effect of
early immersion. Lepetit (1987) found that Japanese learners of French of
different ages and learning experience did not produce meaningful intonation
contours.

This variability of the results in L2 learners’ acquisition of intonation 
might be attributed to methodological problems. Most previous studies of L2
intonation acquisition investigated the age effect for longitudinal variation
(see Trofimovich and Baker, 2006). These studies assume the background
hypothesis that L2 suprasegmental features develop naturally over
time and with experience in an L2 setting (e.g., Guion et al., 2001). However,
some papers report that the L2 acquisition of suprasegmentals through
immersion may differ depending on whether the environment is ESL or EFL
(Aoyama, Guion, Flege, Tsuneo, Akahane-Yamada, 2008; Kang, Guion, Rhee,
and Ahn 2011). Kang, Guion, Rhee, and Ahn (2011) reported that different
developmental directions of L2 suprasegmentals could be found for Korean
learners of English, examining both immersed and non-immersed
L2 learners. Which specific intonational factors are influenced by
immersion, however, remains unresolved.

Intonation can be viewed in two ways: phonetically and phonologically.
The phonetic approach deals with the acoustic correlate of pitch, represented
as F0 contours (plots of F0 against time), while phonological
representations describe pitch contours in discrete terms. In
the present study, we take up intonation at the phonetic level, and thus focus
on the size of the declination slope in whole sentences as well as the
resetting of the slope at boundaries, since F0 declination has been claimed to
be highly L1-dependent (e.g., Thorsen, 1984) as well as speaking
style-dependent (e.g., Umeda, 1982). In its formation, declination is greatly
affected by resets, which coincide with boundary cues of various linguistic units.

The study investigates the influence of immersion during
childhood on the acquisition of L2 intonation. Its goal is to
extend our understanding of factors influencing the acquisition of intonation,
which importantly affects the intelligibility of L2 speech production. As the
immersed group received massive exposure to spoken English as children,
we predicted that their intonation patterns would be more native-like than
those of the non-immersed group, who had studied English for a similar period
of time and had similar proficiency on standardized English tests.

2 L1 Intonational Structure

As a universal linguistic feature, there is a general tendency for F0 to begin 
on a moderate frequency, move to a higher frequency, and then lower across 
the sentence (Pike, 1945). The intonation of most languages can be 
characterized by the declination theory, in which the declining F0 that 
gradually falls throughout the course of a sentence represents a linguistically 
salient aspect of the F0 contour. Declination is found in English (Liberman et 
al., 1985; Maeda, 1976; Pierrehumbert, 1980), Dutch (Cohen, Collier, and 
t’Hart, 1982; Strik and Boves, 1995; Thorsen, 1985; van Heuven, Vincent, & 
Haan, 2000), French (Beyssade & Marandin, 2006; Hirst & Cristo, 1998), 
Japanese (Fujisaki & Sudo, 1971; Pierrehumbert & Beckman, 1988), and 
Finnish (Valimaa-Blum, 1993).

The degree and type of F0 movement, however, may vary across
languages and linguistic structures, depending on the type of
intonational structure they possess. Korean has a different intonational
structure from English. Korean has two prosodic units above the prosodic
word: the intonational phrase (IP) and the accentual phrase (AP) (Jun, 1998;
Jun, 2005). An IP is marked by phrase-final lengthening and a
boundary tone, and is the highest prosodic unit defined by intonation,
including one or more APs. An IP boundary tone has a falling F0 pattern in
declarative sentences. APs in Korean do not have any pitch accents
associated with stressed syllables in their domain and also lack the phrase
accent which occurs at the end of the intermediate phrase of English. APs are
associated with tonal patterns, the basic type being LHLH, though 15 tonal
patterns have been described, based on the number of syllables and segmental
make-up of the AP 1 . In Korean, however, it is difficult to find a distinctive
declination tilt because the narrow pitch in an IP-final part is not applied.
Since there is usually only one accented syllable, in the IP-final syllable, this
makes it hard to form a top-line.

English, unlike Korean, is a stress language in which one syllable is 
stressed within the prosodic foot. The stressed syllable tends to have a greater 
duration, higher pitch, and a more complicated F0 contour than the
unstressed syllables and serves as the syllable on which pitch accents are
realized. English has three prosodic units above the prosodic foot: the
intonation phrase (IP), the intermediate phrase (iP), and the accentual group
(AG) (Wells et al., 2004). An IP is the highest prosodic unit defined by
intonation and may contain one or more iPs. It has final lengthening with a
final falling F0 in the case of declarative sentences. An iP has to contain at
least one pitch accent with either falling or rising F0. Each iP consists of one
or more AGs, defined as the domain for a pitch accent configuration. AGs
in turn contain one or more feet, each of which comprises a strong initial
syllable and following weak syllables.

The L1 intonational structure has some influence on intelligible L2
speech. Jilka (2000) reported that German/American bilinguals used a
wider range of pitch in their American English than in German. The F0 
declination patterns are also affected by phrase types. The English IP has a 
falling F0 contour as a terminal or final signal, while the iP, as a non-terminal 
or continuity signal, has various patterns of F0 contour: LHL, HLL, HL, LL, 
LLH, H, L, etc. Lieberman (1967) suggested that variation could be found in 
the non-terminal part of the fundamental frequency contour, in that the lower 
unit of the iP might have various F0 contours different from higher units of 
the IP. This implies that such difference is due to the intonational differences 
in terminal or non-terminal sentences. 


1 Jun (2000) reports the 15 types of phrasal tones in Korean: LH, LHH, HLH, HH, 
HL, LHL, HHL, HLL, LL, HHLH, LHLH, LLH, HHLH.

The study of intonation involves the analysis of the internal structure and
shape of the intonational contour. In this study, the internal structure of
intonational phrases in L2 English will be investigated. The aim of the study
presented here is to determine how and to what extent Korean learners of
English produce native-like intonational patterns, and how immersion
shapes that production. To this end, we
examine the upper-lines and lower-lines of the F0 contour as an estimate of
F0 declination, duration and mean F0 at the boundary, and the F0
difference and pause duration between two phrases as a measure of
intonational reset.

3 Methodology 

3.1 Participants 

The data were collected from 60 adult participants. None reported being 
diagnosed with a language or speech disorder. The participants were divided 
into three groups of 20 each: IS (Immersed Speakers), male Koreans
learning English; NIS (Non-Immersed Speakers), male Koreans learning
English; and NE (Native English Speakers), also all male. The characteristics of
participants in the three groups are presented in Table 1. Most of the NE 
participants were students of the University of Oregon. They did not speak 
any language other than American English on a daily basis when they took 
part in the experiment. The IS subjects were selected based on early 
immersion experience in an English-speaking country. The ISs had learned 
English in the U.S. or Canada during elementary or secondary school 
(ranging from 3 to 6 years of immersion duration), after which time they 
returned to live in Korea 2 . At the time of the experiment, they were all 
university students in Korea who were majoring in English, or international 
studies and related fields at a private university in Seoul. Korean NISs were 
students majoring in the same disciplines at the same university as Korean 
ISs. The students in both groups had begun learning English in their home
country, South Korea, starting from the 3rd grade of public elementary
school. Most of them studied English for six hours a week in both schools
and private institutions in which native Korean speakers served as English 
teachers. 3 


2 At the time of the experiment, some top-ranked universities in Korea, which
some subjects attended, had a special entrance permit for applicants who had
resided abroad for several years. This implies that academic abilities may not
be equal between the two Korean groups because some IS members did not take
the same university entrance exams as the NIS group.

3 Three NIS subjects had experience learning English from native English
speakers in private institutions during secondary education. At the time of the
experiment, half of the NIS members attended English classes taught by native
English speakers at
their universities.

As we wished to study the effect of immersion on intonation in L2 speech
production, we sought to control the level of English proficiency between the
two groups. In that way, differences in production could more plausibly be
attributed to an effect of immersion than to overall English proficiency. We
used a widely recognized standardized test to select subjects so that the two
Korean groups had similar academic English proficiency. The ITP TOEFL
consisted of listening comprehension, structure and written expression, and
reading comprehension sections, and the maximum possible score was 677 at the
time of the experiment. This test was chosen because it is designed to test
academic English proficiency and is known to be a reliable measure of English
proficiency. The test was administered to all participants in
the two Korean groups. The immersed group performed at 79% accuracy
(534/677), with a mean of 518.17 and a standard deviation of 23.91 in the
reading section, a mean of 562.16 and a standard deviation of 43.37 in the
listening section, and a mean of 527.92 and a standard deviation of 35.2 in
the structure and written expression section. We then selected, from
a larger sample, the 20 NISs who performed similarly to the ISs.
These NISs performed at 77% accuracy (521/677), with a mean of
538.33 and a standard deviation of 28.55 in the reading section, a mean of
516.58 and a standard deviation of 31.87 in the listening section, and a mean
of 527.92 and a standard deviation of 35.2 in the structure and written
expression section. An independent t-test on the proficiency scores of the two
Korean groups revealed no significant difference (t
= .818, df = 39, p > .05), indicating that the two
Korean groups did not differ significantly in English level. Table 1 presents
participants’ information including age, the number of years they had studied
English, and the number of years they spent in North America.


Table 1. Subjects’ Information

Group   Age    LOR    LOE    ITP-TOEFL®   Number
NE      21.3   -      -      -            20m
IS      21.1   4.7    10.3   534          20m
NIS     23.2   -      10.2   521          20m

LOR: length of residence in North America (years); LOE: length of learning English
(years); ITP TOEFL: Institutional Testing Program TOEFL; m: male


3.2 Materials and procedures 

All participants were recorded individually in a quiet room using a portable 
digital recorder. All 60 subjects heard the same recorded stimuli in the same 
order and were recorded using the same equipment. The elicitation procedure 
allowed for the collection of fluently produced sentences without the need for 
reading, while also minimizing the likelihood of mimicry. Fifteen declarative
sentences and one paragraph were used (see Appendix).

The written English materials shown in the Appendix were presented to
the subjects using two separate methods. The 15 sentences were
modeled aurally in short dialogues using prerecorded stimuli. For
example, the test sentence “I closed the door and waited for the bus” was
elicited using these materials:

PC monitor: What did you do? (pause) 

PC monitor: I closed the door and waited for the bus. (pause) 

Interviewer: What did you do? (longer pause) 

After hearing a question and a response through the computer monitor,
followed by the same question a second time, the subjects were asked to
repeat the model sentence in reply to the interviewer. The delay
between the model and its repetition, together with the intervening speech
material, was expected to give subjects time to prepare to speak. For the
paragraph, which included longer sentences, the whole story was presented on
the monitor and subjects were asked to read it aloud. The reading of
the paragraph was used only in analyzing the reset cues between intonational
phrases. Before they produced the sentences, it was confirmed that they knew
what the sentences meant and how to pronounce them. In addition,
Korean subjects were given 30 minutes to practice the sentences before the
experiment. The speech was recorded with a Marantz PMD 650 using a
Shure SM 10A microphone and digitized at 44.05 kHz with 16-bit resolution.

3.3 Intonational measurements 

English sentences were used to evaluate the prosody of each group. Several
acoustic measurements of fundamental frequency (in Hertz) and duration (in
milliseconds) were made. Duration and fundamental frequency were
measured using a waveform display with a time-locked wideband
spectrogram in Praat (version 5.1.17). All acoustic cues were
measured from the initial acoustic signal in both the waveform and the
spectrogram to the final acoustic cues of the boundary such as burst or
spectral cues (Kent and Read, 2003; Ladefoged, 2001). Measures of F0
declination tilt were made for the 15 sentences. For the paragraph,
pause duration and the F0 difference between adjacent phrases were measured
to assess F0 resetting.

For the 15 sentences, F0 was measured at the onset of the phrase, the
absolute maximum point of the F0 peak, the absolute minimum point of the
F0 valley, the local maximum point of the final F0 peak, the local minimum
point of the final F0 valley, and the phrase offset. Thus, F0 values and times
at six points in a sentence were collected and used to compute a slope line, in
which F0 range forms the y-axis and duration (or speech rate) forms
the x-axis. The declination tilt y was computed as follows.



where Δf is the range difference of F0 declination over durational time.


Based on the formula, the intonation contours for the upper-line and
lower-line were determined using a combination of the methods of Cohen et al.
(1982), Thorsen (1984), and Fujisaki and Sudo (1971). In this study, the adopted
method provides information for comparing the size of the F0 slope and duration
across the three groups 4 . The upper-line connects the first maximum F0 peak
appearing in the initial part of the sentence to the final F0 peak of the
utterance, while the lower-line connects the initial F0 minimum to
the final F0 valley of the sentence. The formulas were as follows:

$$\text{Upper-line} = \frac{\text{initial peak of F0} - \text{boundary peak of F0}}{\text{duration} \times 100}$$

$$\text{Lower-line} = \frac{\text{initial valley of F0} - \text{boundary valley of F0}}{\text{duration} \times 100}$$
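
As an illustration of the slope computation, the following minimal Python sketch evaluates a line slope from two F0 points and the sentence duration. It is a sketch under stated assumptions rather than the script used in the study: the function name `declination_slope`, its argument names, and the example values are hypothetical, and the sign is taken as final minus initial so that a falling contour yields a negative value, in line with the interpretation of negative and positive slopes given in the text.

```python
def declination_slope(f0_initial_hz: float, f0_final_hz: float, duration_s: float) -> float:
    """Return the slope of a line joining two F0 points, scaled by 1/100.

    Assumption: the sign is (final - initial), so a falling contour gives a
    negative slope and a rising contour gives a positive one.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return (f0_final_hz - f0_initial_hz) / (duration_s * 100)


# Hypothetical upper-line points: initial peak 230 Hz, final peak 110 Hz.
upper_line = declination_slope(230.0, 110.0, 1.71)
# Hypothetical lower-line points: initial valley 150 Hz, final valley 85 Hz.
lower_line = declination_slope(150.0, 85.0, 1.71)
print(f"upper-line: {upper_line:.2f}, lower-line: {lower_line:.2f}")
```

With duration in seconds and F0 in Hz, the division by 100 keeps typical sentence-level slopes within roughly -1 to +1.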

If the slope approximates 0, a level intonation between the two measured
points of F0 is indicated. If the slope has a negative value approaching -1, it
means that the initial peak point of F0 is higher than the final peak point of
F0. Conversely, a positive value approaching +1 means
that the final peak point of F0 is higher than the initial peak point of F0
(positive values usually appear in interrogative sentences). Several
features were measured:


4 Lieberman et al. (1985) point out some subjectivity resulting from eye-fitting
procedures, criticizing Maeda (1976). However, this study adopts the methods of
Cohen et al. (1982) and Thorsen (1984) because they give us good information about
intonation-related parameters for group comparisons.

Y-axis features
    F0 values at onset and offset positions
    The minimum and maximum values of F0
    F0 range at the first peak, midpeak, and final peak

X-axis features
    The durational times to the minimum and maximum values of F0
    The durational times between onset and offset position

Boundary tone features
    The duration at boundary foot
    Mean F0 at boundary foot

Reset features
    F0 difference between phrases

Y-features: The declination tilt was measured using two cues: F0 range on
the Y-axis, and speech rate on the X-axis. The F0 range, one of the Y-features,
is known to be an indicator of English proficiency (e.g., Backman, 1979;
Willems, 1982). Generally, lower proficiency in English as a second
language is related to a narrower F0 range. In this study, the range was
measured from the highest point to the lowest point of the fundamental
frequency in three areas: the overall F0 range across the phrase; the F0 range
at the F0 maximum, which mostly occurred in the initial part of the phrase;
and the F0 range at the final stressed syllable in the final foot of the phrase.
We used the F0 tracing generated by Praat to determine peaks and troughs. F0
was also calculated from the duration measurements of individual cycles in
the waveform as a supplementary check on the accuracy of the F0 tracker.

X-feature: On the X-axis, speech rate has proved to be a good indicator of
second language proficiency (e.g., Derwing & Munro, 1997; Guion et al.,
2000). In this study, speech rate is operationalized as duration measured
from the initial acoustic signal of the phrase in both the waveform and the 
spectrograms to the final acoustic or spectral cues of the phrase boundary. 

Boundary features: Final strengthening at boundaries is realized in prosodic
domains (e.g., mora, syllable, foot, or prosodic word) at the end of the phrase
in the form of longer duration (Beckman & Edwards, 1991; de Pijper &
Sanderman, 1994; Wightman et al., 1992), strengthening (Fougeron &
Keating, 1997), and alteration of the degree of overlap with adjacent
segments (Byrd & Saltzman, 1998). This study measured the duration and
mean F0 of the phrase-final syllable because final lengthening (longer
duration) and final strengthening (a local F0 peak) occur in this part. For
example, “day” in “My brother is coming on Friday” was measured.

Reset feature: Resets at boundaries were analyzed. In the sentences
containing two intermediate phrases, the conjunction between the first
non-terminal and the second terminal phrase was also assessed. The pause
duration between the two phrases was measured. Additionally, the F0 at the
final point of the preceding phrase was compared to the F0 at the initial point
of the following phrase.
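
The two reset cues can be made concrete with a small sketch. It assumes each phrase has been annotated with its start and end times and its initial and final F0. The `Phrase` fields, the function name `reset_cues`, and the example values are hypothetical, and the F0 difference is taken as the following phrase's initial F0 minus the preceding phrase's final F0, a sign convention the text does not state.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    """Hypothetical annotation of one phrase (times in seconds, F0 in Hz)."""
    start_s: float
    end_s: float
    f0_initial_hz: float
    f0_final_hz: float


def reset_cues(preceding: Phrase, following: Phrase) -> tuple[float, float]:
    """Return (F0 difference in Hz, pause duration in s) across a phrase boundary.

    Assumption: a positive F0 difference means F0 was reset upward at the
    start of the following phrase.
    """
    f0_difference = following.f0_initial_hz - preceding.f0_final_hz
    pause_duration = following.start_s - preceding.end_s
    return f0_difference, pause_duration


# Hypothetical two-phrase utterance such as "I closed the door / and waited for the bus".
first = Phrase(start_s=0.0, end_s=0.9, f0_initial_hz=180.0, f0_final_hz=120.0)
second = Phrase(start_s=1.05, end_s=2.0, f0_initial_hz=135.0, f0_final_hz=95.0)
f0_diff, pause = reset_cues(first, second)
print(f"F0 difference: {f0_diff:.1f} Hz, pause: {pause:.2f} s")
```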

These measures were analyzed with repeated-measures analyses of
variance (RM ANOVAs). The dependent variables (fundamental frequency, speech
rate (duration), boundary cues, and the declination tilt measures) were
examined by the factor of Group (three levels:
NE, IS, NIS). Phrase was used as the repeated measure in order to take into
account individual variation (each sentence by each speaker) along with
within-group variation (F0 range and speech rate for declination tilt, and F0
difference and pause duration for reset). A repeated-measures design can
factor out some of the variation that occurs within individuals, here the
within-speaker variance in pronunciation.

4 Data Analysis 

Three analyses were performed. The first analysis examined the extent to
which the learners were able to produce L2 intonation intelligibly, as
measured by overall intelligibility ratings. In this analysis, the sentences
spoken by the subjects were presented to twenty native English raters for
evaluation. The judgments were then compared across both groups of L2 Korean
learners and the native English speaker group.

The second analysis examined the extent to which the learners were able
to accurately produce specific intonational features. The results of the
acoustic measurements obtained were compared. The purpose was
to analyze to what extent each measurement was affected by both L1
and immersion. The final analysis extended the findings of the first two by
using a multiple regression procedure to investigate how the learners’
production of specific intonational features contributed to native listeners’
intelligibility judgments of the L2 speech.

4.1 Ratings of intelligibility 

4.1.1 Ratings and raters 

Samples from the paragraph recordings were used so that the content was held
relatively constant across speakers. The forty samples (one from each of the
20 IS speakers and the 20 NIS speakers) were randomized and presented to
the native English raters over a loudspeaker. A total of twenty native
English-speaking listeners (twelve males and eight females; age range 20-26
years, M = 23) were recruited to evaluate the intelligibility of the L2 speakers
using a 9-point Likert scale (from 1 = not intelligible to 9 =
extremely native-like, intelligible speech). The experimental sentences were
presented to a group of ten native speakers of English for intelligibility
judgment. All of the raters were native English speakers who had some
experience in teaching English in Korea. All raters reported normal hearing.

For the judgment, the raters listened to some of the paragraph samples
before rating each speaker. The adoption of a 9-point Likert scale follows
the finding of Southwood & Flege (1999) that a 9- or 11-point scale is the most
appropriate rating scale for evaluating the degree of intelligibility of L2
speech samples. The raters were encouraged to use the entire scale and to
guess if they were unsure. After confirming their pre-rating tests, they started
rating the speech separately.

5 Results 

The first purpose of the present study was to test the hypothesis that
immersed learners were more likely to produce intelligible L2 intonation. The
ratings of the samples indicated that most raters rated reliably.
The dependent variable in this analysis was the mean intelligibility rating,
calculated by averaging the twenty English listeners’ ratings of the forty
Korean subjects. The intra-class correlation coefficient was used to measure
the degree of inter-rater reliability for the raters’ evaluation of the
subjects’ speech. The raters’ ratings were highly correlated, r(20) = 0.96,
p < .0001, indicating a high level of agreement among all the
native raters.

Figure 1 presents the mean intelligibility ratings obtained for both
Korean groups. The scores for the two Korean groups differ considerably,
ranging from 3.0 to 6.0 out of 9. As Figure 1 shows, higher mean ratings
were obtained for the immersed
Korean bilinguals. The difference in intelligibility ratings was significant
(immersed group = 5.69, non-immersed group = 4.12, p < .0001).

The obtained ratings were submitted to an independent t-test comparing the two
Korean groups. This analysis revealed a significant group difference (t =
24.236, df = 39, p < .0001), indicating that immersion experience
in childhood has a significant influence on how native-like and intelligible
L2 speech is.
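
The kind of group comparison described here can be illustrated with a minimal sketch of an independent-samples t statistic. The sketch assumes the pooled-variance (Student) form, which the text does not specify. The function name `independent_t` and the rating values are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a: list[float], sample_b: list[float]) -> float:
    """Return Student's t for two independent samples (equal variances assumed)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err


# Hypothetical mean ratings on a 9-point scale for a few speakers per group.
immersed = [5.4, 5.9, 5.7, 6.0, 5.5]
non_immersed = [4.0, 4.3, 3.9, 4.4, 4.1]
print(f"t = {independent_t(immersed, non_immersed):.3f}")
```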


Figure 1. Group means for intelligibility ratings (±1 SE) for both Korean
groups

5.1 Production experiment 

The English phrases were analyzed to investigate differences among the three
groups. Table 2 presents the mean values and standard deviations of the F0
range (overall, initial, and final), speech rate, mean F0 and duration of the
final foot, and slope patterns of upper- and lower-lines. As seen in Table 2
and Figure 3, there were group differences for F0 range, speech rate, and
mean F0 and duration of the phrase-final foot.

Table 2. Mean and Standard Deviation of Parameters in the Intonational Phrase

Measure                                NE             IS             NIS
Slope pattern      Upper-line          -0.72 (1.08)   -0.59 (0.43)   -0.36 (0.34)
                   Lower-line          -0.38 (0.34)   -0.31 (0.28)   -0.22 (0.21)
F0 range (Hz)      Overall range       120 (55)       116 (42)       94 (43)
                   Initial range       60 (32)        56 (42)        47 (37)
                   Final range         31 (31)        35 (40)        37 (33)
Speech rate (s)                        1.71 (0.3)     2.00 (0.28)    2.41 (0.45)
Phrase-final foot  Mean F0 (Hz)        115 (35)       135 (31)       149 (40)
                   Duration (s)        0.25 (0.10)    0.27 (0.11)    0.29 (0.11)
Reset cues         F0 dif. (Hz)        15.45 (28)     4.09 (17)      3.66 (21)
                   Pause duration (s)  0.15 (0.14)    0.17 (0.14)    0.36 (0.26)

5.2 Declination tilt 

For the intonational slope, the correlation between upper-line and lower-line
was significant for the three subject groups pooled, r = 0.317 (p < .01). This
correlation implied that the two lines were closely related. However, as can be
seen in Figure 4, the groups diverged most in terms of the upper-line; the NE
group had the steepest slope, followed by the IS and then the NIS groups.

The RM ANOVA confirmed that there was a significant effect of group 
for upper-lines, F(2, 769) = 26.417, p < .001. Tukey’s tests (p < .05) revealed 
that the mean slope of the upper-lines was steepest for the NE group, 
intermediate for the IS group, and least steep for the NIS group. The RM 
analysis of variance returned a significant effect for the lower-lines 
as well, F(2, 769) = 23.711, p < .001. Tukey’s tests (p < .05) revealed that the 
mean slope of the lower-lines was steepest for the NE group, 
intermediate for the IS group, and least steep for the NIS group. The results are 
summarized in Figure 2, in which the F0 contour shown is that of the 
sentence “I closed the door and waited for the bus”. 
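
The F statistics reported above compare between-group variance with within-group variance. As a rough illustration only, the sketch below computes a one-way, between-groups F ratio in plain Python. It is not the repeated-measures model used in this study; the function name `one_way_f` is hypothetical, and the slope values are made-up placeholders rather than data from the experiment.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times squared deviation
    # of the group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviation of each value from its group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Placeholder upper-line slopes (hypothetical, NOT the study's data).
ne = [-0.9, -0.7, -0.6]
is_ = [-0.7, -0.6, -0.5]
nis = [-0.5, -0.3, -0.3]
f_ratio, df1, df2 = one_way_f([ne, is_, nis])
print(f"F({df1}, {df2}) = {f_ratio:.3f}")
```

A larger F means the group means differ more than the scatter within groups would suggest; the post hoc Tukey’s tests then identify which pairs of groups differ.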





Figure 2. F0 declination tilt of the three groups 

In Figure 2, the NE group shows comparatively steep slopes of the upper-line 
and lower-line (-0.72 and -0.38), a wider F0 range of 120 Hz for the entire 
sentence, 60 Hz in the initial part and 31 Hz in the final foot, and a sentence 
duration of 1.71 seconds. The IS group falls between the NE and NIS groups. It 
shows slopes of the upper-line and lower-line of -0.59 and -0.31, a mean F0 
range of 116 Hz in the entire sentence, 56 Hz in the initial part and 35 Hz in 
the final foot, and a sentence duration of 2.00 seconds. Finally, the NIS group 
shows gentle slopes of the upper-line and lower-line (-0.36 and -0.22), a 
narrower F0 range of 94 Hz in the entire sentence, 47 Hz in the initial part 
and 37 Hz in the final foot, and a sentence duration of 2.41 seconds. 
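
The slope values above describe how steeply the upper-line and lower-line fall across the utterance. A minimal sketch of one way to obtain such a slope is a straight line connecting an early and a late F0 point. The function name, the (time, F0) inputs, and the units (Hz per second) are assumptions made for illustration; this section does not state the units of the reported slopes, and the numbers below are placeholders, not measurements.

```python
def line_slope(start, end):
    """Slope of the straight line through two (time_s, f0_hz) points."""
    (t0, f0_start), (t1, f0_end) = start, end
    if t1 == t0:
        raise ValueError("the two points must differ in time")
    return (f0_end - f0_start) / (t1 - t0)

# Hypothetical F0 peaks (upper-line) and valleys (lower-line) near the
# start and end of one utterance; values are illustrative only.
upper = line_slope((0.1, 220.0), (1.6, 160.0))
lower = line_slope((0.2, 170.0), (1.7, 140.0))
print(upper, lower)  # negative values indicate a falling (declining) line
```

With inputs like these, a steeper upper-line than lower-line also means that the F0 range narrows toward the end of the utterance.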

F0 range 

For the F0 range as an x-axis, the RM analysis of variance confirmed that 
there was a significant effect of group on overall F0 range, F(2, 899) = 
21.967, p < .001. Tukey’s tests (p < .05) revealed that the F0 range was 
smaller for the NIS group than for the NE and IS groups. Figure 3(a) presents 
the F0 range produced by the three groups. The results showed that members 
of the IS group produced an F0 range similar to that of the native 
speakers, whereas members of the NIS group showed a comparatively 
smaller range in F0. This result supports the proposal that more fluent 
learners of English as a second language have a wider F0 range when 
speaking English than less fluent learners (e.g., Bradlow et al., 1996; Mennen, 
2006). 

A narrower F0 range could be evidence of the influence of the native 
language (Scherer, 2000; Van Bezooijen, 1995), or reflect a lack of 
proficiency in the second language (Backman, 1979; Willems, 1982). It is 
unclear whether the narrower F0 range of the NIS group results from the 
influence of Korean or from the NISs’ uncertainty about English pronunciation. 
However, it is noteworthy that the F0 range of Koreans is only 70% of the 
English F0 range, which has more F0 variation. The F0 range for the IS 
group closely approximated the range of the NE group, perhaps showing less L1 
interference than the NIS group. 

An interesting observation about the F0 range is that, although all groups 
have a wider range in the initial part of the sentence and a narrower range in 
the final foot of the sentence, the degree differs. The NEs’ initial range is 
almost twice as large as their final range, but the NISs’ range is almost 
level between the two measured areas, which produces a 
relatively more monotonous intonation. The ISs follow the pattern of the NEs: 
a wider F0 range in the initial part and a narrower F0 range in the final 
part. 
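
The contrast described above can be checked directly against the initial and final F0 ranges in Table 2. The short calculation below divides each group's mean initial range by its mean final range; the variable and function names are illustrative.

```python
# Mean initial and final F0 ranges (Hz) by group, taken from Table 2.
ranges_hz = {"NE": (60, 31), "IS": (56, 35), "NIS": (47, 37)}

def initial_to_final_ratio(initial_hz, final_hz):
    """How many times wider the initial F0 range is than the final one."""
    return initial_hz / final_hz

ratios = {group: initial_to_final_ratio(*pair) for group, pair in ranges_hz.items()}
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
# NE: 1.94
# IS: 1.60
# NIS: 1.27
```

A ratio near 1 corresponds to the flatter, more monotonous pattern noted for the NIS group.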

Speech rate 

For the speech rate as a y-axis, the results of the RM analysis of variance 
confirmed a significant effect of group on speech rate, F(2, 899) = 358.964, p 
< .001. Tukey’s tests (p < .05) revealed that the phrase duration was shortest 
for the NE group, intermediate for the IS group, and longest for the NIS group. 
Figure 3(b) presents the mean duration for the three groups. These results agree 
with previous work in which more native-like speech was produced with a 
faster speech rate (Adams & Munro, 1978; Guion et al., 2000; Lennon, 1990; 
Munro & Derwing, 1995; Sluijter & Van Heuven, 1996). The results showed 
that the immersed group produced sentences with durations intermediate 
between those of the NIS and NE groups, indicating that immersion has an 
influence on speech rate. 

The NISs’ longer sentence durations stem from a 
failure to control various components of an utterance, including content 
versus function words, stressed versus unstressed syllables, and movement 
durations versus steady-state durations. They tend to produce function words 
and unstressed syllables with higher pitch and longer duration than the NE 
group. However, the immersed group approached the patterns of the NE group, 
distinguishing stressed from unstressed syllables, strengthening content words, 
and lengthening the pitch contour on focused words. 




(a) F0 range (Hz) (b) Speech rate (S) 

Figure 3. Mean values with standard errors for two acoustic parameters [(a) 
overall F0 range; (b) speech rate] by three groups of twenty speakers each 
(NE, IS, NIS) 

Mean F0 and duration of phrase-final foot 

Both mean F0 and duration of the phrase-final foot were measured. The RM 
analysis of variance revealed that there was a significant effect of group on 
the duration of the phrase-final foot, F(2, 899) = 32.984, p < .001. Tukey’s 
tests (p < .05) revealed that the duration was longer for the NIS group than 
for the IS and NE groups (see Figure 4(a)). It is striking that the duration of 
the boundary syllable was the longest for NIS speakers, even greater than the 
final lengthening exhibited by English natives (Wightman et al., 1992). 
Vowel insertion may have caused the longer duration for the NIS group because, 
in Korean, English loan words ending in a consonant are typically produced 
with an epenthetic vowel [ɨ] (e.g., bus is pronounced as [bʌsɨ]). 

As for the mean value of the F0 in the phrase-final foot, the results of the 
analysis of variance returned a significant effect of group, F(2, 899) = 95.553, 
p < .001. Tukey’s tests (p < .05) revealed that the mean 
fundamental frequency was highest for the NIS group, intermediate for the IS 
group, and lowest for the NE group. The results are summarized in Figure 4(b). 

These results suggest that the degree of final strengthening was the 
largest for the NIS group, with the IS group intermediate between the two 
other groups. The final foot in the terminal phrases produced by the NE 
group showed lower F0 and shorter duration, while that of the NIS group showed 
higher F0 and longer duration. 




(a) Duration of phrase-final foot (S) (b) Mean F0 of phrase-final foot (Hz) 

Figure 4. Mean values with standard errors for two acoustic parameters [(a) 
Duration of the phrase-final foot; (b) Mean F0 value of the phrase-final foot] 
by three groups of twenty speakers each (NE, IS, NIS) 

To summarize, a gradual downdrift in the value 
of F0 can be found across a phrase (Liberman and Pierrehumbert, 1984), 
but its degree differs depending on immersion. The NE group had a greater 
F0 range, faster speech rate, and lower mean F0 and shorter duration of the 
phrase-final syllable. These characteristics of the measured signals led to the 
steepest declination tilt. The IS group was in between the NE and NIS groups 
on these measures. 

Reset cues 

Results of the RM analysis for the inter-phrase cues confirmed a significant 
effect of group for the F0 difference between the two iPs, F(2, 479) = 3.837, p 
< .05. Tukey’s tests (p < .05) revealed that the pitch difference was largest 
for the NE group and smaller for the IS and NIS groups. This result indicates 
that both the IS and NIS groups have a smaller F0 difference between the two 
intermediate phrases. The NE speakers show a greater difference of about 15 Hz 
between the falling F0 at the end of the first phrase and the higher F0 at the 
onset of the following phrase. Both Korean groups, on the other hand, have 
smaller differences, around 4 Hz (see Figure 5(a)). This indicates that although 
immersion was found to influence L2 intonation in many parameters, 
no effect of immersion was found for some aspects of intonation. NE 
speakers produced a difference in fundamental frequency between terminal 
and non-terminal sentences, while neither the IS nor the NIS speakers did. 

Results of the RM ANOVA showed a significant effect of group on the 
pause duration between the two iPs, F(2, 599) = 67.545, p < .001. Tukey’s 
tests (p < .05) revealed that the pause duration was longer for the NIS group 
and shorter for the NE and IS groups (see Figure 5(b)). These results indicate 
that the pause duration for NISs is the longest (roughly 200 ms longer than in 
the other groups), while members of the IS group had a pause duration similar 
to that of members of the NE group. 
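
The group contrasts in the two reset cues can be read off the means in Table 2. The calculation below simply subtracts group means; the names are illustrative, and the differences say nothing about significance, which the ANOVA and Tukey’s tests address.

```python
# Group means from Table 2: F0 difference between iPs (Hz) and pause duration (s).
f0_reset_hz = {"NE": 15.45, "IS": 4.09, "NIS": 3.66}
pause_s = {"NE": 0.15, "IS": 0.17, "NIS": 0.36}

def gap(values, a, b, scale=1.0):
    """Difference between the means of group a and group b, optionally rescaled."""
    return round((values[a] - values[b]) * scale, 2)

print(gap(f0_reset_hz, "NE", "IS"))            # NE minus IS, in Hz
print(gap(f0_reset_hz, "NE", "NIS"))           # NE minus NIS, in Hz
print(gap(pause_s, "NIS", "NE", scale=1000))   # NIS minus NE, in ms
print(gap(pause_s, "NIS", "IS", scale=1000))   # NIS minus IS, in ms
```

On these means the NE group resets F0 by a little over 11 Hz more than either Korean group, and the NIS group pauses about 0.2 s longer than the other two groups.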




(a) F0 difference between two iPs (b) Pause duration between two iPs 

Figure 5. Mean values with standard errors for the non-terminal phrases for 
two parameters [(a) F0 difference between the two phrases; (b) pause 
duration between the two phrases] by three groups of twenty speakers each 
(NE, IS, NIS) 

To summarize, non-terminal phrases for members of the NE group had 
greater F0 differences and shorter pause durations between phrases than the 
Korean groups. The declination slope was the steepest for the NE group as well. 
The acoustic realization for the IS group was somewhat varied; the duration of 
the pause between the phrases for the IS group was not different from that of 
the NE group, while the F0 difference between the phrases was not different 
from that of the NIS group. 

6 Relationship between Intonational Production and Intelligibility Test 

The production analysis showed that early immersion influences 
L2 learners’ acquisition of English prosody dependently or independently. 
One remaining question about the production tests is to what extent 
improved accuracy in the suprasegmental cues contributes to 
intelligibility judgments. For the analysis, both the intelligibility rating 
scores of the Korean L2 learners and their values for the intonational features 
examined in this study were submitted to correlation and regression analyses. 
Zero-order correlations were computed between the learners’ intelligibility 
ratings (n = 20) and their measured intonational values. 
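
A zero-order correlation here is an ordinary Pearson correlation between two variables, with no other variables partialled out. The sketch below shows the computation in plain Python; `pearson_r` is a hypothetical helper, and the ratings and durations are placeholder values, not the study's data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Placeholder data (hypothetical): ratings and sentence durations in seconds.
ratings = [6.5, 6.0, 5.0, 4.0, 3.5]
durations = [1.7, 1.9, 2.0, 2.3, 2.5]
print(round(pearson_r(ratings, durations), 3))
```

A negative coefficient, as in this toy example, means that higher ratings go with smaller values of the acoustic measure.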


Table 3. Summary of Correlation Analyses between Intelligibility Ratings and 
Acoustic Measurements 

                         F0       Speech    F0 at      Duration at   F0           Pause
                         range    rate      boundary   boundary      difference   duration

Intelligibility ratings  -.368*   -.637**   -.090      -.256*        -.453**      -.182

**: p < .001, *: p < .05 


The analysis indicates that some of the acoustic values measured in this study 
are significantly correlated with the intelligibility ratings (speech rate, 
r = -.637; F0 difference, r = -.453), suggesting that 
speech rate is a strong predictor of intelligibility judgments and that the F0 
difference is also a significant factor in judging the intelligibility of the 
resetting pitch contour. In summary, the results suggest that native judgments 
of L2 speech may reflect universal features of intelligibility judgment; 
temporal cues such as speech rate may have a strong perceptual effect. 
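
One way to gauge the strength of the correlations in Table 3 is the squared coefficient, r², which gives the proportion of variance in the ratings shared with each measure. The calculation below uses the coefficients from Table 3; the dictionary keys are shorthand labels chosen here.

```python
# Correlation coefficients from Table 3.
correlations = {
    "F0 range": -0.368,
    "Speech rate": -0.637,
    "F0 at boundary": -0.090,
    "Duration at boundary": -0.256,
    "F0 difference": -0.453,
    "Pause duration": -0.182,
}

# Squared correlation: proportion of shared variance.
shared_variance = {name: r ** 2 for name, r in correlations.items()}

# List measures from strongest to weakest association.
for name, r2 in sorted(shared_variance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: r^2 = {r2:.3f}")
```

On these figures speech rate shares roughly 41% of its variance with the ratings and F0 difference roughly 21%, with the remaining measures well below that.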

7 Discussion 

An immersion experience was found to have some influence on the 
production of L2 intonation in terms of declination tilt and resetting cues. 
Immersed Korean learners of English (IS) exhibited patterns more similar to 
those of native English speakers than did the non-immersed Korean learners of 
English (NIS). Namely, they had a steeper declination tilt, which includes a 
wider F0 range, much lower F0 and shorter duration at phrase-final 
boundaries, a faster speech rate, and a shorter duration of pauses. 

Transfer from the L1 in forming L2 intonation was evident. In the terminal 
sentences (i.e., intonation phrases (IPs) with only one intermediate phrase 
(iP)), NEs had the widest range in fundamental frequency, the shortest duration 
of phrases, and the lowest mean F0 in the phrase-final foot. These 
characteristics produce the steepest tilt among the three groups in both 
upper-lines and lower-lines. In contrast, NISs showed the smallest range of F0, 
the longest duration of phrases, and the highest mean F0 and longest 
duration in the phrase-final foot. The declination in both upper-lines and 
lower-lines thus showed the gentlest slope for the NIS group. The IS group 
members were clearly positioned between the members of the other two 
groups, having intermediate values for these measures. 

The evidence that early immersion has a significant influence on L2 
intonation is clear in forming resetting cues. In the non-terminal sentences 
(i.e., intonation phrases (IPs) with two intermediate phrases (iPs)), NEs 
exhibited the largest difference in F0 between the end of the first phrase and 
the onset of the second phrase, and the shortest duration of pause between the 
adjacent phrases. In contrast, NISs showed a comparatively longer 
duration of pause and a smaller difference in F0 between the two phrases. 

However, the IS group showed mixed results: an F0 difference 
similar to that of the NIS group and a pause duration similar to that of the NE 
group. The IS and NIS groups were not different from each other in the 
parameter of F0 difference. Both Korean groups showed a similar tendency not to 
reset the F0 between the two intermediate phrases in the non-terminal phrases, 
while English native speakers produced a large difference in F0 by lowering 
the F0 in the final portion of the first iP and raising F0 at the initial point 
of the second iP. It is interesting that the IS group, with more than four 
years of immersion in childhood, failed to reset the F0 between the two 
phrases, even though the duration of the boundary foot for the IS group showed 
the same patterning as that of the NE group. 

A similar case was found in the declination tilt represented in the terminal 
sentences. In the declination tilt of the intonation phrase, even though the IS 
group had a pattern more similar to the NE group for the upper-line, the 
lower-line was clearly more similar to that of the NIS group. This means that 
the lower-line, which connects the lowest point of fundamental frequency in the 
initial section to that in the final segment, was not affected very much by 
immersion in an English-language setting. The immersed learners acquired the 
intonational pattern of English in terms of final lowering associated with a 
sense of fading off, or finalizing the terminal sentences, but not in the sense 
of resetting F0 in the non-terminal phrases. 

Failure to reset the F0 may stem from L1 interference. The intermediate 
phrase in English (the first iP in the non-terminal stimuli in this paper) is 
marked by a phrase accent, either a high or a low tone associated with the 
final stressed syllable, which affects the F0 trajectory at the end of the iP. 
However, the accentual phrase in Korean does not have any phrase accents 
associated with specific syllables. Rather, it has tonal patterns associated 
with the entire phrase (see discussion in the Introduction). The difference 
between the languages may make it less likely for Koreans to learn to associate 
phrase accents with the final stressed syllables of English, especially in 
cases where the phrase is in a non-terminal position. The F0 difference, along 
with the duration of the pause, is a decisive mark of finality or non-finality. 
The immersed Korean group’s failure to reset F0 between intermediate phrases 
suggests that L1 interference plays a larger role in more local, 
language-specific linguistic patterns than in global patterns such as F0 
declination, F0 range, and speech rate. 

Thus, immersion of several years’ duration does not guarantee a native-like 
production of intonation. While some aspects were native-like, others 
were not. More specifically, several years of immersion during childhood 
appear to improve speech rate, phrase-final lengthening, pause duration, and 
global F0 patterns such as declination tilt and F0 range. On the other hand, 
more local F0 patterns such as the lower-line of the F0 contour and F0 
resetting between phrases were found to be less native-like. 
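
How native-like the IS group is on each measure can be expressed as its position between the NIS mean (0) and the NE mean (1), using the group means in Table 2. This index is an illustrative device introduced here, not a statistic reported in the study, and it ignores the variability within groups.

```python
# Group means from Table 2, as (NE, IS, NIS).
means = {
    "Upper-line slope": (-0.72, -0.59, -0.36),
    "Lower-line slope": (-0.38, -0.31, -0.22),
    "Overall F0 range (Hz)": (120, 116, 94),
    "Speech rate (s)": (1.71, 2.00, 2.41),
    "Phrase-final mean F0 (Hz)": (115, 135, 149),
    "Pause duration (s)": (0.15, 0.17, 0.36),
    "F0 difference (Hz)": (15.45, 4.09, 3.66),
}

def nativelikeness(ne, is_, nis):
    """Position of the IS mean between the NIS mean (0.0) and the NE mean (1.0)."""
    return (is_ - nis) / (ne - nis)

index = {name: nativelikeness(*vals) for name, vals in means.items()}
for name, value in index.items():
    print(f"{name}: {value:.2f}")
```

On this index pause duration and overall F0 range sit close to the NE end (about 0.90 and 0.85), whereas the F0 difference between phrases stays near the NIS end (about 0.04).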

That is, even though immersion was found to have a strong, positive effect 
on the acquisition of second language intonation, certain parameters were 
still non-native-like, viz. the difficulty in controlling the F0 in the 
phrase-final syllable and the comparatively weak resetting of F0 between 
adjacent phrases. 
These differences may cause confusion for listeners in that clear boundary 
cues may be absent. The listener may not know whether the sentence has 
ended and another one has begun. Further work assessing the relative 
intelligibility of speech produced by immersed and non-immersed groups is 
needed to explore this question. 

8 Conclusion 

In sum, the immersed group of English learners was found to have more 
native-like intonation than a non-immersed group of English learners who 
had similar lengths of English instruction and similar scores on written 
English proficiency tests. From these results we can infer a facilitative effect 
of immersion on the production of English intonation in a second language: a 
steeper declination through a wider F0 range and a faster speech rate. However, 
some F0-related signals in reset tend to be hard to change in spite of 
immersion in English in childhood, notably the smaller F0 difference between 
phrases. The immersed group still holds some production characteristics of 
intonation which most likely stem from the first language, even though they had 
been immersed in English in an English-speaking country for a significant 
period during childhood. Future research is needed to determine whether these 
characteristics contribute to diminished intelligibility or to the detection of 
a foreign intonation. 

Appendix 

Experimental sentences 

1. The dogs should have eaten the hotdog. 

2. The driver took the cab to the town. 

3. The player sent the mail to Susan. 

4. I can’t remember the scene vividly. 

5. The people built the beautiful bridge. 

6. I have friends who are just like me. 

7. Miss Janet drank a cup of coffee. 

8. They suspect that the suspect killed Ted. 

9. I closed the door and waited for the bus. 

10. Jenny walked home from school in the rain. 

11. Thirteen years later, Mary met him at the same place. 

12. Raise your right hand, if the teacher calls your name. 

13. With a light hammer, the carpenter hit the nail. 

14. All of a sudden, the man rushed to the market. 

15. We went to London, Paris, Cairo, and Boston. 

16. The wind and the sun argued one day over which one was the stronger. Spotting 
a man traveling on the road, they sported a challenge to see which one could 
remove the coat from the man's back the quickest. The wind began. He blew 
strong gusts of air, so strong that the man could barely walk against them. But the 
man clutched his coat tight against him. The wind blew harder and longer, and 
the harder the wind blew, the tighter the man held his coat against him. The wind 
blew until he was exhausted, but he could not remove the coat from the man's 
back. It was now the sun's turn. He gently sent his beams upon the traveler. The 
sun did very little, but quietly shone upon his head and back until the man became 
so warm that he took off his coat and headed for the nearest shade tree. 